
    Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

    We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, to attain a higher level of robustness and accuracy, we introduce additional sparse depth samples, which are either acquired with a low-resolution depth sensor or computed via visual Simultaneous Localization and Mapping (SLAM) algorithms. We propose the use of a single deep regression network to learn directly from the RGB-D raw data, and explore the impact of the number of depth samples on prediction accuracy. Our experiments show that, compared to using only RGB images, the addition of 100 spatially random depth samples reduces the prediction root-mean-square error by 50% on the NYU-Depth-v2 indoor dataset. It also boosts the percentage of reliable predictions from 59% to 92% on the KITTI dataset. We demonstrate two applications of the proposed algorithm: a plug-in module in SLAM to convert sparse maps into dense maps, and super-resolution for LiDARs. Software and a video demonstration are publicly available. Comment: Accepted to ICRA 2018. 8 pages, 8 figures, 3 tables. Video at https://www.youtube.com/watch?v=vNIIT_M7x7Y. Code at https://github.com/fangchangma/sparse-to-dens
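
    To make the paper's central idea concrete, the following is a minimal sketch (not the authors' released code) of early-fusion depth regression: concatenate the RGB image with a sparse depth map, feed the result to a regression network, and train against dense ground truth. The tiny network body, the random sampling helper, and the RMSE-style loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseToDenseNet(nn.Module):
    """Toy stand-in for a single deep regression network that maps an RGB image
    plus a sparse depth map (early fusion) to a dense depth prediction."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 4 channels: RGB + sparse depth
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),              # 1 channel: dense depth
        )

    def forward(self, rgb, sparse_depth):
        return self.net(torch.cat([rgb, sparse_depth], dim=1))


def sample_sparse_depth(dense_depth, num_samples=100):
    """Keep `num_samples` spatially random depth pixels per image, zero the rest."""
    b, _, h, w = dense_depth.shape
    mask = torch.zeros_like(dense_depth)
    for i in range(b):
        idx = torch.randperm(h * w)[:num_samples]
        mask[i].view(-1)[idx] = 1.0
    return dense_depth * mask


# One training step with an RMSE-style loss against dense ground truth.
model = SparseToDenseNet()
rgb = torch.rand(2, 3, 64, 64)
gt_depth = torch.rand(2, 1, 64, 64) * 10.0
pred = model(rgb, sample_sparse_depth(gt_depth, num_samples=100))
loss = torch.sqrt(F.mse_loss(pred, gt_depth))
loss.backward()
print(f"RMSE loss: {loss.item():.3f}")
```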

    FastDepth: Fast Monocular Depth Estimation on Embedded Systems

    Depth sensing is a critical function for robotic tasks such as localization, mapping, and obstacle detection. There has been significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance one mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve accuracy similar to prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors' knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle. Comment: Accepted for presentation at ICRA 2019. 8 pages, 6 figures, 7 tables.
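
    The emphasis on a low-latency decoder can be illustrated with a rough sketch of a depthwise-separable upsampling block, the kind of building block that keeps decoder cost low on embedded hardware. The layer widths, kernel size, and absence of skip connections below are simplifying assumptions, not the published FastDepth architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpBlock(nn.Module):
    """Cheap decoder block: 5x5 depthwise conv + 1x1 pointwise conv + nearest-
    neighbor upsampling. Depthwise separation keeps the multiply-add count low,
    which is what makes the decoder fast enough for embedded inference."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 5, padding=2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        x = F.relu(self.pointwise(self.depthwise(x)))
        return F.interpolate(x, scale_factor=2, mode="nearest")


# Toy decoder: four upsampling blocks followed by a 1x1 depth head.
decoder = nn.Sequential(
    UpBlock(256, 128), UpBlock(128, 64), UpBlock(64, 32), UpBlock(32, 16),
    nn.Conv2d(16, 1, 1),
)
encoder_features = torch.rand(1, 256, 14, 14)   # e.g. encoder output at 1/16 resolution
depth = decoder(encoder_features)               # -> torch.Size([1, 1, 224, 224])
print(depth.shape)
```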

    Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera

    Depth completion, the technique of estimating a dense depth image from sparse depth measurements, has a variety of applications in robotics and autonomous driving. However, depth completion faces three main challenges: the irregularly spaced pattern of the sparse depth input, the difficulty of handling multiple sensor modalities (when color images are available), and the lack of dense, pixel-level ground-truth depth labels. In this work, we address all of these challenges. Specifically, we develop a deep regression model to learn a direct mapping from sparse depth (and color images) to dense depth. We also propose a self-supervised training framework that requires only sequences of color and sparse depth images, without the need for dense depth labels. Our experiments demonstrate that our network, when trained with semi-dense annotations, attains state-of-the-art accuracy and is the winning approach on the KITTI depth completion benchmark at the time of submission. Furthermore, the self-supervised framework outperforms a number of existing solutions trained with semi-dense annotations. Comment: Software: https://github.com/fangchangma/self-supervised-depth-completion . Video: https://youtu.be/bGXfvF261pc . 12 pages, 6 figures, 3 tables.
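
    A minimal sketch of the photometric-consistency signal that drives the self-supervised framework is shown below, assuming known camera intrinsics K and a known relative pose between two frames; in the full framework this term would be combined with a sparse depth loss and a smoothness term, and the helper here is an illustrative simplification rather than the released implementation.

```python
import torch
import torch.nn.functional as F


def photometric_loss(img_curr, img_near, depth_pred, K, T_curr_to_near):
    """Warp a nearby frame into the current frame using the predicted depth and
    the relative pose, then penalize the photometric difference. This supplies
    a training signal that needs no dense depth labels.
    Shapes: images (B, 3, H, W), depth (B, 1, H, W), K (3, 3), pose (B, 4, 4)."""
    b, _, h, w = depth_pred.shape
    # Pixel grid in homogeneous coordinates.
    v, u = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                          torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)    # (3, H*W)
    # Back-project with the predicted depth, transform into the nearby frame.
    cam = torch.inverse(K) @ pix * depth_pred.reshape(b, 1, -1)            # (B, 3, H*W)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w)], dim=1)               # (B, 4, H*W)
    proj = K @ (T_curr_to_near @ cam_h)[:, :3, :]                          # (B, 3, H*W)
    uv = proj[:, :2, :] / proj[:, 2:3, :].clamp(min=1e-6)
    # Normalize to [-1, 1] and resample the nearby image at the projected pixels.
    grid = torch.stack([2 * uv[:, 0] / (w - 1) - 1,
                        2 * uv[:, 1] / (h - 1) - 1], dim=-1).reshape(b, h, w, 2)
    warped = F.grid_sample(img_near, grid, align_corners=True)
    return (warped - img_curr).abs().mean()


# Toy call with random tensors (identity pose, arbitrary intrinsics).
K = torch.tensor([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
loss = photometric_loss(torch.rand(2, 3, 48, 64), torch.rand(2, 3, 48, 64),
                        torch.rand(2, 1, 48, 64) + 1.0, K, torch.eye(4).expand(2, 4, 4))
print(loss)
```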

    FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction

    Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry without test-time optimization is feasible using deep neural networks, showing remarkable promise and high efficiency. However, the reconstructed geometry, typically represented as a 3D truncated signed distance function (TSDF), is often coarse and lacks fine geometric details. To address this problem, we propose three effective solutions for improving the fidelity of inference-based 3D reconstructions. We first present a resolution-agnostic TSDF supervision strategy to provide the network with a more accurate learning signal during training, avoiding the pitfalls of the TSDF interpolation seen in previous work. We then introduce a depth guidance strategy using multi-view depth estimates to enhance the scene representation and recover more accurate surfaces. Finally, we develop a novel architecture for the final layers of the network, conditioning the output TSDF prediction on high-resolution image features in addition to coarse voxel features, enabling sharper reconstruction of fine details. Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics. Comment: ICCV 202
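
    One way to read the resolution-agnostic TSDF supervision is: instead of interpolating the ground-truth TSDF onto a fixed voxel grid, sample arbitrary 3D points, interpolate the predicted TSDF at those points, and compare against ground-truth TSDF values computed directly at the sampled locations. The sketch below illustrates that reading under stated assumptions (an L1 loss, points given in normalized grid coordinates); it is not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def point_sampled_tsdf_loss(pred_tsdf_grid, sample_xyz, gt_tsdf_at_points):
    """Supervise a predicted TSDF volume at arbitrary 3D points rather than only
    at voxel centers. `pred_tsdf_grid` is (B, 1, D, H, W); `sample_xyz` holds
    points in grid_sample's normalized (x, y, z) convention with shape (B, N, 3);
    `gt_tsdf_at_points` is (B, N), with ground-truth TSDF values computed
    directly at the sampled points (the ground truth is never interpolated)."""
    grid = sample_xyz.view(sample_xyz.shape[0], -1, 1, 1, 3)        # (B, N, 1, 1, 3)
    pred = F.grid_sample(pred_tsdf_grid, grid, align_corners=True)  # (B, 1, N, 1, 1)
    return F.l1_loss(pred.view(gt_tsdf_at_points.shape), gt_tsdf_at_points)


# Toy usage with random tensors standing in for network output and GT queries.
pred_grid = torch.rand(1, 1, 32, 32, 32) * 2 - 1
points = torch.rand(1, 1024, 3) * 2 - 1
gt_values = torch.rand(1, 1024) * 2 - 1
print(point_sampled_tsdf_loss(pred_grid, points, gt_values))
```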

    On maximum-reward motion in stochastic environments

    Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2015. Cataloged from PDF version of thesis. Includes bibliographical references (pages 75-77). In this thesis, we consider the problem of an autonomous mobile robot operating in a stochastic reward field to maximize the total reward collected in an online setting. This is a generalization of the problem in which an unmanned aerial vehicle (UAV) collects data from randomly deployed unattended ground sensors (UGS). Specifically, the rewards are assumed to be generated by a Poisson point process. The robot has a limited perception range, and thus it discovers the reward field on the fly. The robot is assumed to be a dynamical system with substantial drift in one direction, e.g., a high-speed airplane, so it cannot traverse the entire field. The task of the robot is to maximize the total reward collected during the course of the mission, given the above constraints. Under these assumptions, we analyze the performance of a simple receding-horizon planning algorithm with respect to the perception range, the robot's agility, and the computational resources available. First, we show that, with a highly limited perception range, the robot can collect as many rewards as if it could see the entire reward field, if and only if the reward distribution is light-tailed. Second, we show that the expected reward collected scales proportionally to the square root of the robot's agility. Finally, we prove that the overall computational workload increases linearly with the mission length, i.e., the distance traveled. We verify our results in simulation examples. We also present an application of our theoretical study to the ground sensor selection problem: for an inference/estimation task, we prove that sensors with randomized quality outperform those with homogeneous precision, since random sensors yield a higher confidence level of estimation (lower variance), under certain technical assumptions. This finding may have practical implications for the design of UAV-UGS systems. by Fangchang Ma. S.M.
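
    The receding-horizon planner analyzed in the thesis can be illustrated with a small simulation: a lattice reward field with Poisson-distributed rewards, a robot that drifts one column forward per step, sees only a few columns ahead, and can move at most a fixed number of rows per step. The grid discretization, horizon length, and agility model below are illustrative assumptions, not the thesis's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)


def receding_horizon_run(width=50, length=500, horizon=5, agility=1, rate=0.2):
    """Toy receding-horizon planner for a robot with strong drift in +x.
    Rewards on the lattice are i.i.d. Poisson (a discretized Poisson point
    process); the robot sees only `horizon` columns ahead and may move at most
    `agility` rows per column advanced."""
    field = rng.poisson(rate, size=(length, width)).astype(float)

    def lookahead_values(x):
        # value[r] = best reward collectible starting at (x, r) over the
        # columns currently visible, i.e. x .. min(x + horizon, length) - 1.
        end = min(x + horizon, length)
        vals = field[end - 1].copy()
        for col in range(end - 2, x - 1, -1):
            nxt = np.empty(width)
            for r in range(width):
                lo, hi = max(0, r - agility), min(width, r + agility + 1)
                nxt[r] = field[col, r] + vals[lo:hi].max()
            vals = nxt
        return vals

    row, total = width // 2, 0.0
    for x in range(length - 1):
        # Plan over the visible window, then execute a single column step.
        vals = lookahead_values(x + 1)
        lo, hi = max(0, row - agility), min(width, row + agility + 1)
        row = lo + int(np.argmax(vals[lo:hi]))
        total += field[x + 1, row]
    return total


print(f"total reward collected: {receding_horizon_run():.1f}")
```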

    Algorithms for single-view depth image estimation

    This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph.D. in Autonomous Systems, Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 143-158). Depth sensing is fundamental to autonomous navigation, localization, and mapping. However, existing depth sensors have significant shortcomings, most notably low effective spatial resolution. To attain enhanced resolution with existing hardware, this dissertation studies the single-view depth estimation problem: the goal is to reconstruct the dense and complete 3D structure of the scene, given only sparse depth measurements. To this end, this thesis proposes three different algorithms for depth estimation. The first contribution is an algorithm for efficient reconstruction of 3D planar surfaces. This algorithm assumes that the 3D structure is piecewise planar, and thus that the second-order derivatives of the depth image are sparse. We develop a linear programming formulation for recovery of the 3D surfaces under these assumptions, and provide conditions under which the reconstruction is exact. This method requires no learning, yet still outperforms deep learning-based methods under certain conditions. The second contribution is a deep regression network and a self-supervised learning framework. We formulate depth completion as a pixel-level regression problem and solve it by training a neural network. Additionally, to address the difficulty of gathering ground-truth annotations for depth data, we develop a self-supervised framework that trains the regression network by enforcing temporal photometric consistency, using only raw RGB and sparse depth data. The supervised method achieves state-of-the-art accuracy, and the self-supervised approach attains a lower but comparable accuracy. Our third contribution is a two-stage algorithm for a broad class of inverse problems (e.g., depth completion and image inpainting). We assume that the target image is the output of a generative neural network and that only a subset of the output pixels is observed. The goal is to reconstruct the unseen pixels from the partial samples. Our proposed algorithm first recovers the corresponding low-dimensional latent vector using simple gradient descent, and then reconstructs the entire output with a single forward pass. We provide conditions under which the proposed algorithm achieves exact reconstruction, and empirically demonstrate its effectiveness on real data. "The work reported in this dissertation was supported in part by the Office of Naval Research (ONR) grant N00014-17-1-2670 and the NVIDIA Corporation"--Page 5. by Fangchang Ma. Ph.D. in Autonomous Systems, Massachusetts Institute of Technology, Department of Aeronautics and Astronautics.
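
    The first (learning-free) contribution lends itself to a compact convex-optimization sketch: minimize the l1 norm of the depth image's second-order differences subject to agreeing with the sparse measurements. The formulation below uses cvxpy for readability and is an illustrative reading of the approach, not the thesis's exact program; the sampling rate and toy planar scene are assumptions.

```python
import cvxpy as cp
import numpy as np


def reconstruct_piecewise_planar(sparse_depth, mask):
    """Recover a dense depth image from sparse samples under a piecewise-planar
    prior: the second-order differences of depth are sparse, so their l1 norm
    is minimized subject to matching the measured pixels exactly.
    `sparse_depth` is (H, W); `mask` is a boolean (H, W) array of measured pixels."""
    h, w = sparse_depth.shape
    D = cp.Variable((h, w))
    d2x = D[:, 2:] - 2 * D[:, 1:-1] + D[:, :-2]      # horizontal second differences
    d2y = D[2:, :] - 2 * D[1:-1, :] + D[:-2, :]      # vertical second differences
    objective = cp.Minimize(cp.sum(cp.abs(d2x)) + cp.sum(cp.abs(d2y)))
    M = mask.astype(float)
    constraints = [cp.multiply(M, D) == M * sparse_depth]
    cp.Problem(objective, constraints).solve()
    return D.value


# Toy example: a tilted plane observed at roughly 5% of its pixels.
h, w = 30, 40
yy, xx = np.mgrid[0:h, 0:w]
plane = 2.0 + 0.05 * xx + 0.03 * yy
mask = np.random.default_rng(0).random((h, w)) < 0.05
recovered = reconstruct_piecewise_planar(plane, mask)
print(f"max abs error: {np.abs(recovered - plane).max():.2e}")  # small for a planar scene
```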

    Invertibility of convolutional generative networks from partial measurements

    The problem of inverting generative neural networks (i.e., recovering the input latent code given partial network output), motivated by image inpainting, has recently been studied in prior work that focused on fully-connected networks. In this work, we present new theoretical results for convolutional networks, which are more widely used in practice. The network inversion problem is highly non-convex, and hence is typically computationally intractable and comes without optimality guarantees. However, we rigorously prove that, for a 2-layer convolutional generative network with ReLU activations and Gaussian-distributed random weights, the input latent code can be recovered from the network output efficiently using simple gradient descent. This new theoretical finding implies that the mapping from the low-dimensional latent space to the high-dimensional image space is one-to-one, under our assumptions. In addition, the same conclusion holds even when the network output is only partially observed (i.e., with missing pixels). We further demonstrate, empirically, that the same conclusion extends to networks with multiple layers, other activation functions (leaky ReLU, sigmoid, and tanh), and weights trained on real datasets.
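
    The reconstruction procedure suggested by this analysis is simple enough to sketch end to end: freeze a two-layer convolutional generator with random Gaussian weights, observe a masked subset of its output, and recover a latent code by gradient descent on the masked reconstruction error. The generator dimensions, mask rate, and optimizer settings below are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A 2-layer convolutional generator with ReLU and Gaussian random weights,
# mapping a low-dimensional latent code to a larger image.
generator = nn.Sequential(
    nn.ConvTranspose2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
)
for p in generator.parameters():
    nn.init.normal_(p, std=0.1)
generator.requires_grad_(False)

# Ground-truth latent code and its image; observe only ~30% of the pixels.
z_true = torch.randn(1, 16, 8, 8)
image = generator(z_true)                      # (1, 1, 32, 32)
mask = (torch.rand_like(image) < 0.3).float()

# Recover a latent code from the partial measurements by gradient descent.
z = torch.randn(1, 16, 8, 8, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)
for step in range(2000):
    optimizer.zero_grad()
    loss = ((generator(z) - image) * mask).pow(2).sum()
    loss.backward()
    optimizer.step()

# Reconstruct the full image with a single forward pass and check unseen pixels.
reconstruction = generator(z.detach())
unseen_error = ((reconstruction - image) * (1 - mask)).abs().mean()
print(f"masked loss {loss.item():.2e}, mean error on unseen pixels {unseen_error:.4f}")
```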